# required packages to run visNetwork examples
library(visNetwork)
library(igraph)
library(dplyr)

Overview

uttR follows a pipeline consisting of 3 steps:

  1. Define the model (without a data set or outcome)
  2. Fit the model (apply to a data set and decide the framework)
  3. Conduct follow-up or create output
set.seed(5)
nodes <- data.frame(id = c("make_model", "fit_model", "follow_up"))
edges <- data.frame(from = c("make_model", "fit_model"), 
                    to = c("fit_model", "follow_up"))
nodes$label <- c("make_model", "fit_model", "follow_up")
nodes$group <- c("make", "fit", "followup")
nodes$level <- c(1, 2, 3)
visNetwork(nodes, edges) %>%
  visEdges(arrows = "to") %>%
  visHierarchicalLayout(direction = "LR")

Each step is carried out by a main function or family of functions. A user begins by defining a model with one of the make_distribution() functions and, for Bayesian fits, setting priors. Next, the user fits the model with fit_model(), which uses S3 methods to carry out the fitting. Lastly, the user can run any follow-up analysis. Currently supported follow-up procedures are prediction, simulation (frequentist only), and graphing (frequentist only).

set.seed(5)
nodes <- data.frame(id = c("make_model", "set_priors", "fit_model", "do_simulation", "do_prediction", "graph_distribution", "combine_plots"))
edges <- data.frame(from = c("make_model", "set_priors", "make_model", "fit_model", "fit_model", "fit_model", "graph_distribution"), to = c("set_priors", "fit_model", "fit_model", "do_simulation", "do_prediction", "graph_distribution", "combine_plots"))
nodes$label <- c("make_model", "set_priors", "fit_model", "do_simulation", "do_prediction", "plot_distribution", "combine_plots")
nodes$group <- c("make", "make", "fit", "followup", "followup", "followup", "followup")
nodes$level <- c(1, 1, 2, 3, 3, 3, 4)
nodes$x <- c(300, 380, 300, 200, 300, 400, 400)
nodes$y <- c(0, 250, 500, 1000, 1000, 1000, 1500)

visNetwork(nodes, edges) %>%
  visEdges(arrows = "to") %>%
  visIgraphLayout(layout = "layout_nicely")

Step 1 - Make Model

make_distribution

Below is a diagram of the functions provided for creating a model. Each function takes a list of predictors (make_distribution() also takes the name of a distribution) and returns a tibble containing the name of the distribution and the right-hand side of a model equation, defined as a linear combination of the given predictors.

Note: There is no make_model() function - in this diagram it is simply used as a reference to connect the make_distribution() functions to their respective parts within the uttR framework.

set.seed(5)
nodes <- data.frame(id = c("make_model", "make_binom", "make_pois", 
                           "make_distribution", "make_negbinom", "make_betabinom"),
                    label = c("make_model", "make_binom", "make_pois",
                              "make_distribution", "make_negbinom", "make_betabinom"),
                    level = c(2, rep(1, 5)))
edges <- data.frame(from = rep("make_model", 5), to = nodes$id[-1])
visNetwork(nodes, edges) %>%
  visHierarchicalLayout()
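
To make the shape of that return value concrete, here is a minimal sketch in base R. It is not the uttR implementation: the function names sketch_make_distribution() and sketch_make_pois() are hypothetical, the predictors are assumed to be passed as a character vector, and a plain data frame stands in for the tibble.

```r
# Hypothetical stand-in for a make_*() helper; not the uttR implementation.
sketch_make_distribution <- function(distribution, predictors) {
  stopifnot(is.character(distribution), length(distribution) == 1,
            is.character(predictors), length(predictors) >= 1)
  data.frame(
    distribution = distribution,
    # Right-hand side as a linear combination of the predictors
    rhs = paste(predictors, collapse = " + "),
    stringsAsFactors = FALSE
  )
}

# Hypothetical convenience wrapper in the spirit of make_pois()
sketch_make_pois <- function(predictors) {
  sketch_make_distribution("Poisson", predictors)
}

model_def <- sketch_make_pois(c("age", "dose"))
print(model_def)

# No outcome is stored in the definition; one can be attached at fit time.
model_formula <- as.formula(paste("count ~", model_def$rhs))
print(model_formula)
```

Keeping only the right-hand side means the same definition can be reused with different outcomes and data sets, which matches the "define first, fit later" order of the pipeline.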

set_priors()

If a user will be fitting a Bayesian model, they must also set a prior for each predictor in the model, including the intercept. This is done using the set_priors() function, which takes a comma-separated list of expressions of the form variable = prior, where each prior is written in JAGS notation.

set_priors():

set_individual_priors():

nodes <- data.frame(id = c("set_priors", "set_individual_priors", "get_predictors"),
                    label = c("set_priors", "set_individual_priors", "get_predictors"),
                    level = c(1, 2, 3))
edges <- data.frame(from = c("set_priors", "set_individual_priors"),
                    to = c("set_individual_priors", "get_predictors"))
visNetwork(nodes, edges) %>%
  visEdges(arrows = "to") %>%
  visHierarchicalLayout(direction = "LR")
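
One way a function can accept variable = prior pairs written in JAGS notation is to capture the arguments without evaluating them, since something like dnorm(0, 0.001) means a density call in R but a prior in JAGS. The sketch below illustrates that idea only; sketch_set_priors() is a hypothetical name and the real set_priors() may work differently.

```r
# Hypothetical sketch: capture `variable = prior` pairs as unevaluated text.
sketch_set_priors <- function(...) {
  exprs <- as.list(substitute(list(...)))[-1]  # drop the leading `list` symbol
  nms <- names(exprs)
  if (is.null(nms) || any(!nzchar(nms))) {
    stop("every prior must be supplied as variable = prior")
  }
  # Turn each captured expression back into a string of JAGS code
  vapply(exprs, function(e) paste(deparse(e), collapse = ""), character(1))
}

priors <- sketch_set_priors(intercept = dnorm(0, 0.001),
                            age = dnorm(0, 0.001))
print(priors)
```

The result is a named character vector, one entry per coefficient, which is a convenient form for later pasting into a JAGS model string.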

Step 2 - fit_model

The next step of the pipeline is to fit the specified model(s). The fit_model() function takes the tibble returned from the make_model step and fits the specified model(s) using S3 methods chosen by class.

Class hierarchy is as follows:

set.seed(5)
nodes <- data.frame(id = c("Binomial", "Binomial.Frequentist", "Binomial.Bayesian", "Binomial.Randomforest", "Poisson", "Poisson.Frequentist", "BetaBinomial", "BetaBinomial.Frequentist", "NegativeBinomial", "NegativeBinomial.Frequentist"),
                    label = c("Binomial", "Binomial.Frequentist", "Binomial.Bayesian", "Binomial.Randomforest", "Poisson", "Poisson.Frequentist", "BetaBinomial", "BetaBinomial.Frequentist", "NegativeBinomial", "NegativeBinomial.Frequentist"),
                    group = c(1, 1, 1, 1, 2, 2, 3, 3, 4, 4),
                    level = c(1, 2, 2, 2, 1, 2, 1, 2, 1, 2))
edges <- data.frame(from = c("Binomial", "Binomial", "Binomial", "Poisson", "BetaBinomial", "NegativeBinomial"),
                    to = c("Binomial.Frequentist", "Binomial.Bayesian", "Binomial.Randomforest", "Poisson.Frequentist", "BetaBinomial.Frequentist", "NegativeBinomial.Frequentist"))
visNetwork(nodes, edges) %>%
  visEdges(arrows = "to") %>%
  visHierarchicalLayout()

Once fit_model() is called it goes through 2 main steps:

  1. Construct the object
  2. Fit the object

Construct the object

To construct the object that will be fit, a constructor is called. This is also where model options are set: each distribution-specific constructor searches the list returned from model_options() for the option names it recognizes and adds the matching options to the object.

model_options():

distribution.fit - S3 class constructor:

nodes <- data.frame(id = c(1, 2),
                    label = c("model_options", "constructor"),
                    level = c(1, 2))
edges <- data.frame(from = c(1),
                    to = c(2))
visNetwork(nodes, edges) %>%
  visIgraphLayout(layout = "layout_nicely") %>%
  visEdges(arrows = "to") %>%
  visHierarchicalLayout(direction = "LR")
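
The relationship between the options list and the constructor can be sketched as follows. Everything here is illustrative: sketch_model_options(), sketch_new_model(), and the option names trials and theta_start are hypothetical, and the two-element class vector is an assumption about how the hierarchy shown earlier could be encoded.

```r
# Hypothetical option collector: returns its named arguments as a list.
sketch_model_options <- function(...) list(...)

# Hypothetical constructor: keeps only the options this distribution recognizes
# and tags the object with a class taken from the hierarchy above.
sketch_new_model <- function(distribution, fit_type, rhs, options = list()) {
  known <- list(
    Binomial = c("trials"),               # assumed option name
    NegativeBinomial = c("theta_start")   # assumed option name
  )
  keep <- intersect(names(options), known[[distribution]])
  structure(
    list(rhs = rhs, options = options[keep]),
    class = c(paste(distribution, fit_type, sep = "."), distribution)
  )
}

opts <- sketch_model_options(trials = 10, theta_start = 1)
obj <- sketch_new_model("Binomial", "Frequentist", "age + dose", opts)
class(obj)
names(obj$options)
```

Filtering by name lets a user pass one options list for several distributions at once, with each constructor picking up only what applies to it.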

Fit Object

Once the object has been constructed, it is fit using the associated S3 method for fit_object(). The result of the fit is then converted into an S3 class called model_results using the as_result() function.

fit_object.distribution.Frequentist():

fit_object.distribution.Bayesian():

fit_object.distribution.Randomforest():

create_jags_code():

The workflow for the fit_object() function is as follows:

set.seed(30)
nodes <- data.frame(id = c("fit_object", "as_result", "get_predictors_randomforest", "create_jags_code", "make_prior", "make_likelihood", "get_predictors_bayesian"),
                    label = c("fit_object", "as_result", "get_predictors", "create_jags_code", "make_prior", "make_likelihood", "get_predictors"),
                    group = c("All", "All", "Random Forest", "Bayesian", "Bayesian", "Bayesian", "Bayesian"),
                    x = c(250, 100, 300, 600, 500, 700, 900),
                    y = c(0, 100, 200, 200, 400, 500, 550))
edges <- data.frame(from = c("fit_object", "fit_object", "create_jags_code", "create_jags_code", "make_likelihood", "fit_object"), 
                    to = c("as_result", "create_jags_code", "make_prior", "make_likelihood", "get_predictors_bayesian", "get_predictors_randomforest"))
visNetwork(nodes, edges) %>%
  visIgraphLayout(layout = "layout_nicely") %>%
  visEdges(arrows = "to") %>%
  visLegend()
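
The dispatch-then-wrap pattern described above can be sketched with base R and glm(). The names prefixed with sketch_ are hypothetical stand-ins, the data are simulated, and the contents of the model_results object are an assumption made for illustration.

```r
set.seed(1)
dat <- data.frame(x = rnorm(50))
dat$y <- rbinom(50, size = 1, prob = plogis(0.5 * dat$x))

# Hypothetical wrapper: gives every fit the same result class.
sketch_as_result <- function(fit, object) {
  structure(list(fit = fit, model_class = class(object)),
            class = "model_results")
}

# Generic: dispatches on the class of the constructed object.
sketch_fit_object <- function(object, data, ...) UseMethod("sketch_fit_object")

# Method for a frequentist binomial model.
sketch_fit_object.Binomial.Frequentist <- function(object, data, ...) {
  fit <- glm(object$formula, family = binomial(), data = data)
  sketch_as_result(fit, object)
}

# Fallback for classes without a fitting method.
sketch_fit_object.default <- function(object, data, ...) {
  stop("No fitting method for class: ", paste(class(object), collapse = ", "))
}

obj <- structure(list(formula = y ~ x),
                 class = c("Binomial.Frequentist", "Binomial"))
res <- sketch_fit_object(obj, dat)
class(res)
```

Because every method returns the same result class, follow-up functions only need to know about model_results rather than about each fitting backend.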

Putting model construction and fit together

Together, a call to fit_model() follows this workflow:

nodes <- data.frame(id = 1:10,
                    label = c("fit_model", "constructor", "model_options", "fit_object", "as_result", "get_predictors", "create_jags_code", "make_prior", "make_likelihood", "get_predictors"),
                    group = c("All", "All", "All", "All", "All", "Random Forest", "Bayesian", "Bayesian", "Bayesian", "Bayesian"),
                    x = c(500, 0, 150, 500, 700, 250, 500, 700, 300, 0),
                    y = c(0, 0, 150, 300, 300, 300, 500, 700, 700, 700))
edges <- data.frame(from = c(1, 1, 1, 4, 4, 4, 7, 7, 9),
                    to = c(3, 2, 4, 5, 6, 7, 8, 9, 10))
visNetwork(nodes, edges) %>%
  visIgraphLayout(layout = "layout_nicely") %>%
  visEdges(arrows = "to") %>%
  visLegend()
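
For the Bayesian branch, the diagram shows create_jags_code() calling make_prior() and make_likelihood(). A rough sketch of how such pieces could assemble a JAGS model string for a logistic regression is given below. The sketch_ functions, the b_ coefficient naming, and the data names y and N are all assumptions, not the code uttR generates.

```r
# Hypothetical helper: one prior line per coefficient.
sketch_make_prior <- function(priors) {
  paste0("  b_", names(priors), " ~ ", priors, collapse = "\n")
}

# Hypothetical helper: Bernoulli likelihood with a logit link.
sketch_make_likelihood <- function(predictors) {
  terms <- c("b_intercept", paste0("b_", predictors, " * ", predictors, "[i]"))
  paste0("  for (i in 1:N) {\n",
         "    y[i] ~ dbern(p[i])\n",
         "    logit(p[i]) <- ", paste(terms, collapse = " + "), "\n",
         "  }")
}

# Hypothetical assembler: wraps likelihood and priors in a model block.
sketch_create_jags_code <- function(priors, predictors) {
  paste0("model {\n",
         sketch_make_likelihood(predictors), "\n",
         sketch_make_prior(priors), "\n",
         "}")
}

priors <- c(intercept = "dnorm(0, 0.001)", age = "dnorm(0, 0.001)")
jags_code <- sketch_create_jags_code(priors, predictors = "age")
cat(jags_code, "\n")
```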

Step 3 - Follow-up analysis

The final step within the pipeline is to complete any follow-up analysis. These functions take in the results of fit_model() and any additional options.

The supported follow-up functions are:

do_prediction

The do_prediction() function predicts values based on the fit model(s). It predicts either for the data set that the model was fit to or for a new data set supplied by the user.

Methods implemented for do_prediction():

Note: Although the link function used for the inverse transformation is the same within each type of model (binomial, Poisson, negative binomial, and beta binomial), the form in which values are returned from model_prediction() may differ, so the method is specific to both the distribution and the fit type.

do_prediction():

model_prediction.distribution.fit:

inv_transformation.distribution.fit:

The workflow for do_prediction() is as follows:

set.seed(5)
nodes <- data.frame(id = 1:5,
                    label = c("do_prediction", "constructor", "model_prediction", "get_predictors", "inv_transformation"), 
                    group = c("General Function", "Method", "Method", "General Function", "Method"),
                    x = c(500, 0, 500, 500, 1000),
                    y = c(0, 300, 300, 600, 300))
edges <- data.frame(from = c(1, 1, 3, 1),
                    to = c(2, 3, 4, 5))
visNetwork(nodes, edges) %>%
  visIgraphLayout() %>%
  visLegend() %>%
  visEdges(arrows = "to")
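
The two-stage idea (predict on the link scale, then apply the inverse link) can be sketched for a frequentist binomial fit using glm(). The sketch_ function names are hypothetical and the data are simulated; only predict(), plogis(), and fitted() are real base R calls.

```r
set.seed(2)
train <- data.frame(x = rnorm(100))
train$y <- rbinom(100, size = 1, prob = plogis(-0.3 + 0.8 * train$x))
fit <- glm(y ~ x, family = binomial(), data = train)

# Hypothetical stage 1: linear predictor on the link (logit) scale.
sketch_model_prediction <- function(fit, newdata) {
  predict(fit, newdata = newdata, type = "link")
}

# Hypothetical stage 2: inverse logit maps the linear predictor to a probability.
sketch_inv_transformation <- function(eta) plogis(eta)

# Hypothetical driver: fall back to the training data when no new data is given.
sketch_do_prediction <- function(fit, newdata = NULL) {
  if (is.null(newdata)) newdata <- fit$data
  sketch_inv_transformation(sketch_model_prediction(fit, newdata))
}

p_train <- sketch_do_prediction(fit)
p_new <- sketch_do_prediction(fit, data.frame(x = c(-1, 1)))

# The two-stage result agrees with glm's own fitted probabilities.
all.equal(unname(p_train), unname(fitted(fit)))
```

Splitting prediction from the inverse transformation is what allows a different fitting backend to return values in another form while reusing the same overall driver.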

do_simulation

Users can also simulate data based on their fit model(s). This simulation occurs by first predicting the values, then simulating from the appropriate distribution.

Methods implemented for do_simulation():

do_simulation():

simulate_distribution.distribution.fit:

The workflow for do_simulation() is as follows:

nodes <- data.frame(id = 1:5,
                    label = c("do_simulation", "constructor", "model_prediction" ,"get_predictors", "simulate_distribution"),
                    group = c("General Function", "Method", "Method", "General Function", "Method"),
                    x = c(500, 0, 500, 500, 1000),
                    y = c(0, 300, 300, 600, 300))
edges <- data.frame(from = c(1, 1, 3, 1),
                    to = c(2, 3, 4, 5))
visNetwork(nodes, edges) %>%
  visIgraphLayout() %>%
  visEdges(arrows = "to") %>%
  visLegend()
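
"Predict, then simulate from the appropriate distribution" can be sketched for a frequentist Poisson fit as follows. sketch_do_simulation() and its n_sims argument are hypothetical, and the data are simulated for the example.

```r
set.seed(3)
train <- data.frame(x = runif(200))
train$count <- rpois(200, lambda = exp(0.2 + 1.1 * train$x))
fit <- glm(count ~ x, family = poisson(), data = train)

# Hypothetical sketch: predict the mean for each row, then draw Poisson counts.
sketch_do_simulation <- function(fit, newdata, n_sims = 1) {
  mu <- predict(fit, newdata = newdata, type = "response")
  # Each replicate is one simulated data set with one value per row of newdata
  replicate(n_sims, rpois(length(mu), lambda = mu))
}

sims <- sketch_do_simulation(fit, train, n_sims = 5)
dim(sims)
```

For other distributions the drawing step would change (for example rbinom() for a binomial model) while the predict-first structure stays the same.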

Graphing

The final supported process for the pipeline is graphing. Graphing can be done in two steps - graphing each distribution individually, and graphing all distributions overlaid onto one graph. If a grouping variable is set, then separate graphs are made for each group when graphing the distributions together. Additionally, there is an option for a histogram of the original data to be displayed behind the distributions.

The function plot_distribution() is called first to graph each distribution separately. If the user then wants to combine all of the plots, the combine_plots() function takes in the output from plot_distribution() and returns a plot with all of the distributions.

Note: If a user wants a histogram to be displayed on their combined plot it must also be included in their individual plots.

plot_distribution():

combine_plots():

model_distribution.distribution.fit:

The workflow for plot_distribution() is as follows:

set.seed(5)
nodes <- data.frame(id = 1:3,
                    label = c("plot_distribution", "constructor", "model_distribution"),
                    group = c("General Function", "Method", "Method"),
                    level = c(1, 2, 2))
edges <- data.frame(from = c(1, 1),
                    to = c(2, 3))
visNetwork(nodes, edges) %>%
  visIgraphLayout() %>%
  visEdges(arrows = "to") %>%
  visHierarchicalLayout() %>%
  visLegend()
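
The overlay idea (several candidate distributions drawn over a histogram of the original data) can be sketched in base graphics. This does not use plot_distribution() or combine_plots(); the data are simulated, and the negative binomial size parameter comes from a simple moment estimate chosen only to keep the example dependency-free.

```r
set.seed(4)
counts <- rnbinom(300, size = 2, mu = 4)
grid <- 0:max(counts)

m <- mean(counts)
v <- var(counts)
stopifnot(v > m)  # the moment estimate of size needs overdispersed data

# One row per (value, distribution) pair, ready to be drawn together
curves <- data.frame(
  value = rep(grid, 2),
  distribution = rep(c("Poisson", "NegativeBinomial"), each = length(grid)),
  density = c(dpois(grid, lambda = m),
              dnbinom(grid, size = m^2 / (v - m), mu = m))
)

# Histogram of the observed data, with both distributions overlaid
hist(counts, breaks = seq(-0.5, max(counts) + 0.5, by = 1), freq = FALSE,
     main = "Candidate distributions over the observed counts", xlab = "count")
for (d in unique(curves$distribution)) {
  sub <- curves[curves$distribution == d, ]
  lines(sub$value, sub$density, type = "b", lty = if (d == "Poisson") 1 else 2)
}
legend("topright", legend = unique(curves$distribution), lty = c(1, 2))
```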

Final Workflow

Now that each piece has been discussed individually, the final workflow is as follows:

nodes <- data.frame(id = 1:31,
                    label = c("make_binom", "make_pois", "make_betabinom", "make_negbinom",
                              "make_distribution", "set_priors", "set_individual_priors",
                              "fit_model", "constructor", "model_options", "fit_object",
                              "get_predictors", "create_jags_code", "make_prior", "make_likelihood",
                              "get_predictors", "as_result", "do_prediction", "constructor",
                              "model_prediction", "get_predictors", "inv_transformation", 
                              "do_simulation", "constructor", "model_prediction", "get_predictors",
                              "simulate_distribution", "plot_distribution", "constructor",
                              "model_distribution", "combine_plots"),
                    group = c("Function", "Function", "Function", "Function", "Not included",
                              "Function", "Function", "Function", "Method", "Function", "Method",
                              "Function", "Function", "Function", "Function", "Function", "Function",
                              "Function", "Method", "Method", "Function", "Method", "Function",
                              "Method", "Method", "Function", "Method", "Function", "Method",
                              "Method", "Method"), 
                    level = c(1, 1, 1, 1, 2, 2, 2, 
                              3, 2, 3, 4, 5, 5, 6, 6, 7, 5,
                              8, 9, 9, 10, 9,
                              8, 9, 9, 10, 9,
                              8, 9, 9, 11))
edges <- data.frame(from = c(1:4, 5, 6, 6, 5,
                              rep(8, 3), 11, 11, 13, 13, 15, 11,
                             8, 18, 18, 20, 18,
                             8, 23, 23, 25, 23,
                             8, 28, 28, 28),
                    to = c(rep(5, 4), 6, 7, 8, 8,
                           9:11, 12, 13, 14, 15, 16, 17,
                           18, 19, 20, 21, 22,
                           23, 24, 25, 26, 27,
                           28, 29, 30, 31))
visNetwork(nodes, edges) %>%
  visIgraphLayout() %>%
  visHierarchicalLayout() %>%
  visLegend() %>%
  visEdges(width = 3) %>%
  visNodes(font = list(size = 25)) %>%
  visInteraction(zoomView = TRUE, navigationButtons = TRUE)


bprucka/uttr documentation built on May 27, 2019, 11:54 a.m.